VParC: a compression scheme for numeric data in column-oriented databases
نویسندگان
چکیده
Compression is one of the most important techniques in data management, which is usually used to improve the query efficiency in database. However, there are some restrictions on existing compression algorithms that have been applied to numeric data in column-oriented databases. First, a compression algorithm is suitable only for columns with certain data distributions not for all kinds of data columns; second, a data column with irregular distribution is hard to be compressed; third, the data column compressed by using heavyweight methods cannot be operated before decompression which leads to inefficient query. Based on the fact that it is more possible for a column to have sub-regularity than have global-regularity, we developed a compression scheme called Vertically Partitioning Compression (VParC). This method is suitable for columns with different data distributions, even for irregular columns in some cases. The more important thing is that data compressed by VParC can be operated directly without decompression in advance. Details of the compression and query evaluation approaches are presented in this paper and the results of our experiments demonstrate the promising features of VParC.
منابع مشابه
Data Compression in Database Query Processing
Row-oriented databases (or “row-store”) employ data compression methods (like dictionary encoding) to reduce the I/O cost by decreasing the data sizes. However, there are two limitations on row-stores when applying data compression schemes: (1) row-stores only allow encoding one single value at a time, and (2) they have to pay the decompression cost in query processing. The above shortcomings l...
متن کاملOptimizing Write Performance for Read Optimized Databases
Compression in column-oriented databases has been proven to offer both performance enhancements and reductions in storage consumption. This is especially true for read access as compressed data can directly be processed for query execution.Nevertheless, compression happens to be disadvantageous when it comes to write access due to unavoidable re-compression: write-access requires significantly ...
متن کاملBrighthouse: an analytic data warehouse for ad-hoc queries
Brighthouse is a column-oriented data warehouse with an automatically tuned, ultra small overhead metadata layer called Knowledge Grid, that is used as an alternative to classical indexes. The advantages of column-oriented data storage, as well as data compression have already been welldocumented, especially in the context of analytic, decision support querying. This paper demonstrates addition...
متن کاملA Column-Aware Index Management Using Flash Memory for Read-Intensive Databases
Most traditional database systems exploit a record-oriented model where the attributes of a record are placed contiguously in a hard disk to achieve high performance writes. However, for read-mostly data warehouse systems, the column-oriented database has become a proper model because of its superior read performance. Today, flash memory is largely recognized as the preferred storage media for ...
متن کاملCuttle: Enabling Cross-Column Compression in Distributed Column Stores
We observe that, in real-world distributed data warehouse systems, data columns from different sources often exhibit redundancy. Even though these systems can employ both general and column-oriented compression schemes to reduce the data storage pressure, such crosscolumn redundancy (CCR) is not recognized or exploited effectively. Therefore, we propose Cuttle, a column storage system that enab...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Int. Arab J. Inf. Technol.
دوره 13 شماره
صفحات -
تاریخ انتشار 2016